DOCUMENT RESUME

ED 401 309                                                  TM 025 870

AUTHOR          Myford, Carol M.; And Others
TITLE           Constructing Scoring Rubrics: Using "Facets" To Study
                Design Features of Descriptive Graphic Rating Scales.
PUB DATE        Apr 96
NOTE            61p.; Paper presented at the Annual Meeting of the
                American Educational Research Association (New York,
                NY, April 8-12, 1996).
PUB TYPE        Reports - Evaluative/Feasibility (142) --
                Speeches/Conference Papers (150)

EDRS PRICE      MF01/PC03 Plus Postage.
DESCRIPTORS     *Evaluators; *Rating Scales; *Scoring; *Student
                Evaluation; *Test Construction; Test Use; Visual
                Arts
IDENTIFIERS     *FACETS Computer Program; FACETS Model; National
                Assessment of Educational Progress; *Scoring
                Rubrics



ABSTRACT 

Developing scoring rubrics to evaluate student work 
was studied, concentrating on the use of intermediate points in 
rating scales. How scales that allow for intermediate points between 
defined categories should be constructed and used was explored. In 
the recent National Assessment of Educational Progress (NAEP) visual 
arts field test, researchers experimented with several formats for 
constructing scoring rubrics. Some descriptive graphic rating scales 
(continuous score scales) were pilot tested by 11 raters who scored 
the NAEP visual arts test for grades 4 and 8. Descriptive graphic 
ratings were designed to evaluate 4 test production blocks from the 
assessment, for a total of 50 pieces of student work. The "Facets" 
computer software was used to analyze the rating data. Raters were 
able to use the descriptive rating scales reliably. Some of the 
constructed scales were able to support 7 to 10 rating points rather 
than the traditional 3 or 4 points. However, there was little 
appreciable gain in reliability for scales having more than five 
points. The particular features of the scale (such as defined 
midpoint) were not as important as the knowledge, skills, and 
motivation of the rater. An appendix contains the graphic rating 
scales. (Contains 2 figures, 11 tables, and 32 references.) (SLD) 



Reproductions supplied by EDRS are the best that can be made from the original document.



Constructing Scoring Rubrics: 
Using Facets to Study Design Features of 
Descriptive Graphic Rating Scales 















U.S. DEPARTMENT OF EDUCATION 
Office of Educational Research and Improvement 
EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC) 

This document has been reproduced as received from the person or organization 
originating it. 

Minor changes have been made to improve reproduction quality. 

Points of view or opinions stated in this document do not necessarily represent 
official OERI position or policy. 

PERMISSION TO REPRODUCE AND DISSEMINATE THIS MATERIAL HAS BEEN GRANTED 
TO THE EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC) 



Carol M. Myford 
Eugene Johnson 
Ray Wilkins 
Hilary Persky 
Mary Michaels 










Educational Testing Service 
Princeton, NJ 



Paper presented at the annual meeting of the American Educational Research 
Association, April 1996, New York 






How should one go about the task of devising scoring rubrics to evaluate 
students' work? What format should those rubrics take? While there are resources 
available that provide practical advice for constructing rubrics (e.g., Airasian, 1991; 
Herman, Aschbacher & Winters, 1992; Linn & Gronlund, 1995; Stiggins, 1987), such 
resources do not address the issue of whether certain rubric formats are 
psychometrically superior to others. In several large-scale performance assessment 
programs (i.e., Vermont, Kentucky, and California statewide assessment programs; 
Pittsburgh Arts PROPEL), 3- or 4-point rubrics are commonly used. When 
constructing these rubrics, assessment developers typically identify specific 
observable aspects of the performance and/or product that are to be evaluated (i.e., 
the performance criteria). They then devise rating scales for the criteria, each scale 
containing three or four categories. The qualities or characteristics of a response 
associated with each category are defined in narrative form as precisely as possible. 
Raters use these narrative descriptions of qualities or characteristics to decide which 
rating to assign. 

A thorny problem inevitably arises when raters use such rubrics: What rating 
should a rater give if a student's work falls "in the cracks" between the defined 
categories of the scale? For example, let's suppose the rater judges the student's 
work to be clearly better than the qualities or characteristics of performance described 
as a "level 1" response but not as good as the qualities or characteristics described as a 
"level 2" response. How should the rater handle this situation when it arises? In 
their recent review of the statistical procedures used in the California Learning 
Assessment System, Cronbach, Bradburn, and Horvitz (1994) suggest that raters be 
encouraged to use intermediate scale values for students' "borderline" responses. In 
their view, if raters were allowed to use intermediate values, the accuracy of the 
rating process could be improved, thus reducing a source of measurement error. 

They recommend that, at the very least, raters should be encouraged to use 
midpoints between the defined categories (i.e., for a scale with three defined 
categories labeled 1, 2, and 3, the raters should also be allowed to use the 
intermediate points 1.5 and 2.5 if they so desired). As a further refinement, they 
advocate allowing not only the use of midpoints but also points on either side of the 
midpoint (i.e., not only 2.5 but also 2.4 if the rater were leaning more toward a 2 
than a 3, and 2.6 if the rater were leaning more toward a 3 than a 2). 







If we decide to follow the advice of Cronbach et al. and allow for intermediate 
points in our rating scales, there are some important decisions we must make: 

What formats should we use when we construct these scales? Do some rating 
formats have better track records than others? How many intermediate points 
should our scales have? Is there an optimum number? How can we tell whether 
our scales have too few, too many, or the right number of points? We turn to a brief 
review of the literature on rating formats and the literature on the relationship 
between number of response categories and reliability to help us answer these 
questions. 

Comparative Studies of Rating Formats. Surprisingly few studies comparing 
rating formats have been carried out in education-related settings. However, this 
has been a focus of much research and heated debate in personnel/organizational 
psychology since the 1950's. There is quite an extensive literature of comparative 
studies of various rating formats (e.g., behaviorally anchored rating scales (BARS), 
forced-choice formats, mixed standard scales, graphic rating scales). Reviewers of 
this literature have generally agreed that no one format has emerged as clearly 
superior to the others. In Landy and Farr's (1983) words, 

After more than 30 years of serious research, it seems that little progress has 
been made in developing an efficient and psychometrically sound alternative 
to the traditional graphic rating scale. ... It appears likely that greater progress 
in understanding performance judgments will come from research on the 
rating process than from a continued search for the "Holy Format." (p. 90) 

As Guion (1986) suggests, of more importance than the particular rating format used 
is the competence of the rater. Cronbach (1990) echoes similar sentiments: "Many 
reporting formats and scoring systems for ratings have been tried. On the whole, it 
appears that the knowledge and motivation of the informant affect validity more 
than do features of the scale" (p. 587). 

Number of Response Categories and Reliability. After conducting a series of 
studies examining the effect of number of categories on reliability of ratings, Bendig 
(1952a, 1952b, 1953, 1954a, 1954b) concluded that the reliability of the scales he 
employed did not increase as the number of scale categories increased from 5 to 9. 
He found that reliability decreased for scales having 3 or fewer categories and for 
scales having 11 or more categories. When Finn (1972) examined the reliability of 
the scales he used, he concluded that reliability dropped with fewer than 3 or with 
more than 7 categories. In their Monte Carlo studies of factors affecting rating 
reliability, Lissitz and Green (1975) and Jenkins and Taber (1977) agreed that there 
was little appreciable gain in reliability when the number of scale categories 
exceeded 5. 

Based on their review of the findings from these studies, Landy and Farr 
(1983) recommended that rating scales not include more than 9 categories since "the 
weight of evidence suggests that individuals have limited capacities for dealing with 
simultaneous categories of heterogeneous information. This was suggested long 
ago by Miller (1956) in his now-famous 'seven, plus or minus two' dictum, and 
appears to generalize to rating behavior" (pp. 83-84). Somewhat tongue-in-cheek, 
Guion (1986) quips, "it sometimes seems as if the five-point scale has been decreed 
from heaven, but there are other options" (p. 349). Indeed, when one compares the 
recommendations of educational measurement experts, one finds substantial 
differences of opinion regarding optimal number of scale points. Linn and 
Gronlund (1995) recommend that scales use between 3 and 7 scale points, while 
Cronbach (1990) recommends 4- to 7-point scales. Mehrens and Lehmann (1991) 
suggest that a maximum of 10 points be used, but they believe that 5- to 7-point 
scales are appropriate for most purposes. By contrast, Payne (1992) suggests that the 
optimal number of categories is probably 7 to 9 but that, depending on the nature of the 
task and sophistication level of the raters, scales having as many as 7 to 20 categories 
could appropriately be used. 

Background of the Project 



In the recent National Assessment of Educational Progress (NAEP) visual arts 
field test, we experimented with several different formats for constructing scoring 
rubrics. Past NAEP assessments in other content areas have typically made use of 
rubrics laid out as 3- or 4-point rating scales. In scoring these assessments, it has not 
been uncommon for raters to encounter examples of students' works that are 
difficult to rate. Certain works often do not appear to fit into any of the given 
categories of a scale but, rather, seem to lie "in the cracks" between the defined 
categories. We decided to pilot some descriptive graphic rating scales (i.e., 
continuous score scales) as part of the NAEP visual arts field test so that we could 
learn about how raters would use intermediate score points when they have that 
option. 

A descriptive graphic rating scale has two defined endpoints. These points 
are connected by a horizontal line. Descriptive phrases identify different points 
along the continuum. For some scales, the descriptive phrases might be quite brief, 
while for other scales the phrases might be more extensive. When a rater uses the 
scale to evaluate a product or a performance, the rater makes a vertical slash along 
the line to indicate where along that continuum the work lies. Descriptive graphic 
rating scales can incorporate different design features (i.e., presence or absence of a 
defined midpoint, presence or absence of hatchmarks along the line that connects 
the endpoints). We were interested in learning about how raters used these design 
features and which features, if any, affected interrater reliability. 

The descriptive graphic rating format seems particularly attractive for 
assessment in the arts because it emphasizes the continuous nature of many of the 
performance criteria in these fields. While some arts-related performance criteria 
can appropriately be defined in terms of discrete categories, a number of criteria 
central to these domains cannot. Additionally, as Popham (1990) points out, this 
format takes advantage of the fact that "many people can use visual images to help 
them make qualitative gradations in their ratings" (p. 297). Because the raters in 
this study were visual artists accustomed to representing images visually, we felt it 
was appropriate to try out this rating format with them. 

Our research set out to provide answers to a number of questions we posed 
about descriptive graphic rating scales. Through our experimentation we hoped to 
learn about how raters employ these scales to make judgments about student work. 
We planned to use what we learned to help NAEP program personnel decide 
whether scales in this format might be used to score some of the production tasks 
included in the 1997 NAEP visual arts operational assessment. The specific 
questions we posed are listed below: 

• How many categories do the descriptive graphic rating scales we designed 
support? How should we think about them? as 3-point scales? as 4-point 
scales? as 5-point scales? etc. 







• What is the effect of the number of categories on reliability of ratings for these 
scales? Does reliability cease to increase and instead begin to decrease at a 
certain point? Is there a point beyond which there is little utility to be gained 
in adding scale categories? 

• How do various design features of these scales affect reliability? Do raters use 
descriptive graphic rating scales with defined midpoints any more reliably 
than scales without defined midpoints? Do raters use descriptive graphic 
rating scales with hatchmarks any more reliably than scales without 
hatchmarks? Are 5 hatchmarks any better than 3? 

• Will raters use descriptive graphic rating scales reliably if in their training 
they are only shown and discuss examples of students' work (i.e., anchors) for 
the endpoints of the scale but not for the midpoint? If they see and talk about 
anchors that fall at both ends and in the middle, will they produce more 
reliable ratings? If they are shown anchors that fall at various points along 
the full continuum, will they produce even more reliable ratings? What's the 
"bare minimum" raters need in the way of training anchors in order to use 
these descriptive graphic rating scales reliably? 

Method 



Participants 



Raters. Eleven of the raters who scored the grades 4 and 8 NAEP visual arts 
field test participated in this study. Eight were white females, one was a black 
female, and two were white males. Five raters had master's degrees in art, one had 
a bachelor's degree in art, and five had degrees in fields other than visual arts. Two 
had experience teaching art in grades K-6, and three had college-level art teaching 
experience. All were practicing artists whose own work covered a variety of arts 
specialties (i.e., painting, drawing, sculpture, photography, computer art, video, 
filmmaking, design, printmaking, fiber art, mural design and execution, collage). 
None of the raters had any previous experience using descriptive graphic rating 
scales to evaluate students' works of art. While all the raters had taken part in the 
scoring of the field test, none of them had scored the blocks of art-making activities 
included in this study. 

Trainers. Two persons participated in the study as trainers of the raters. One 
was a white female, and one was a white male. Both had previously served as 
trainers during the scoring of the NAEP visual arts field test for grades 4 and 8. Both 
had advanced degrees in visual arts. One had experience teaching art in K-12 
settings, and both had college-level art teaching experience. Neither of the trainers 
had any previous experience training raters to use descriptive graphic rating scales 
to evaluate students' works of art. 



Procedure 



Development of Descriptive Graphic Rating Scales. Part II of the NAEP 
visual arts field test included several production blocks. A production block 
contains multiple related exercises. The exercises frequently make use of the same 
stimulus material, and a combination of exercise formats is employed. When 
devising production blocks, efforts were made to integrate three artistic processes 
(i.e., creating, performing and responding) within a block. The exercises contained 
in a block are designed to engage students in activities typical of these three artistic 
processes. We selected four of these blocks to focus on in our study (Blocks 
R1VAX1, R123VAX6, R23VAX5, and R23VAX7). The specific art-making activities 
upon which we concentrated our scale development efforts are described below: 

• Block R1VAX1: Students selected a type of animal and then drew a 
comfortable place (an environment) for the animal. They were directed to 
make use of near and far shapes (i.e., perspective) and shapes that overlap 
when drawing their animal's place. They were also instructed to use the 
space and shape of their drawing paper in ways that were best for depicting 
the animal's place. 

• Block R123VAX6: Students drew an idea for a mural to show something that 
was important to the people in their community. They were instructed to use 
shapes, lines, colors, and forms that would capture the attention of people 
from far away. As they worked on their design, they were asked to think 
about how they were using the drawing space. 

• Block R23VAX5: Students created a self-portrait. They were instructed to use 
materials in a way that would communicate to a viewer something that they 
thought was important about their personality. 

• Block R23VAX7: Students read an ancient legend and then experimented 
with ways to visually express figures in the legend. They were asked to 
creatively combine the figures into a complete drawing, showing how they 
might interact. Students were reminded to choose media (drawing tools) 
from their materials packet that would help them express their ideas 
most effectively. 







We designed descriptive graphic rating scales to evaluate works of art 
students created in these blocks. (See Appendix A for copies of the scales we 
constructed.) The individual scales exhibited different combinations of design 
features. For some of the scales, we defined two endpoints of the scale and a 
midpoint; for other scales, we defined only the two endpoints. For some of the 
scales we constructed a horizontal line to connect the endpoints and then placed 
either three hatchmarks or five hatchmarks at specific points along the line to show 
key transition points along the continuum. Other scales had no hatchmarks along 
the horizontal line. 

Selection of Student Work. We selected samples of works of art that students 
created for these four production blocks during the field test. For each of the blocks, 
we used the ratings given during the scoring of the field test to assist us in selecting 
50 pieces of student work that would represent the full range of student ability 
exhibited. (In the scoring of the field test, raters used 3-point scales to evaluate these 
works. When pulling the 50 samples of student work for the study, we included 
some samples that received all 3's, some that received all 2's, some that received all 
1's, and some that received mixtures of 3's, 2's, and 1's.) The trainers selected 
additional samples of student work to serve as anchors for rater training purposes 
and to include in sets for raters to use as practice. 

Rater Training. During each training session, the trainer introduced the 
raters to the two scales the raters would be using during that session. The trainer 
defined each of the performance criteria, and then raters examined and discussed 
samples of student work in order to clarify the meaning of each criterion and the 
distinctions between the various points on the scale. The trainer presented 
examples of students' works (i.e., anchors) and talked about the characteristics of each 
that should be considered when assigning a rating. After introducing each work and 
talking about its characteristics, the trainer would fasten the work to the wall 
showing its position along the linear continuum.¹ 



¹In some ways, the scales we devised might be thought of as product scales (Linn & Gronlund, 1995) since we 
purposely chose examples of students' work that represented various levels of quality and then visually displayed 
them so that raters could develop a sense of the continuum of quality they were likely to see when they carried out the 
actual scoring. Indeed, during the scoring sessions, if a rater had difficulty deciding where to place his or her slash 
along the horizontal line, the rater would frequently bring the work to the wall and compare it to the anchors that 
were displayed to determine where the student's work seemed to best "fit" along the continuum. 







In this study, we experimented with several approaches to using anchors 
during training. In one training session, the trainer showed and discussed anchors 
for five points along the continuum (i.e., the two endpoints and three points in 
between). For other training sessions, the trainer showed and discussed anchors for 
both endpoints of each scale and for the midpoint. In still other training sessions, 
the trainer showed and discussed anchors for only the endpoints. We wanted to 
know whether raters needed to see and talk about anchors along the full continuum 
in order to use the scales reliably, or whether they could score reliably after having 
seen anchors for two endpoints and a midpoint, or for endpoints only. 

After the trainers introduced and discussed the anchors, time was set aside for 
raters to practice scoring samples of student work. The raters would independently 
score small sets of five student works and then, as a group, discuss the ratings they 
gave. Using a flip chart, the trainer would draw five horizontal lines and ask each 
rater to come forward and indicate where along each line he or she had placed each 
student's work. After all the raters had taken turns making their slash marks, the 
group discussed their ratings in order to clarify meanings of each of the performance 
criteria and to attempt to reach consensus in their usage of the rating scales. 

Scoring the Student Work. The scoring took place over a two-day period. 
The experimental design we employed is shown in Figures 1 and 2. We randomly 
assigned the eleven raters to two groups. Group 1 met with Trainer 1 and 
completed training to score Block R123VAX6/AM. The raters then scored the 50 
students' works selected for that block. Concurrently, Group 2 met with Trainer 2, 
completed training to score Block R1VAX1/AM, and then scored the 50 students' 
works selected for that block. Following a lunch break, Group 1 met with Trainer 2 
and were trained to score Block R1VAX1/PM. The raters scored the same set of 50 
students' works that Group 2 had scored in the morning, but the scales they used 
had different design features than the scales Group 2 had used (see Figure 1 for a 
description of those design features). Group 2 met with Trainer 1 to learn to score 
Block R123VAX6/PM.² Group 2 scored the same set of 50 students' works that 
Group 1 had scored in the morning, but the scales they used had different design 
features than the scales Group 1 had used. 

Insert Figures 1 and 2 about here 

On the second day, we again randomly assigned the eleven raters to two new 
groups. Group 1 met with Trainer 1 and completed training to score Block 
R23VAX5/AM. The raters then scored the 50 students' works selected for that block. 
Concurrently, Group 2 met with Trainer 2, completed training to score Block 
R23VAX7/AM, and then scored the 50 students' works selected for that block. 
Following a lunch break, Group 1 met with Trainer 2 and were trained to score 
Block R23VAX7/PM. The raters scored the same set of 50 students' works that 
Group 2 had scored in the morning, but the scales they used had different design 
features than the scales Group 2 had used (see Figure 2 for a description of those 
design features). Group 2 met with Trainer 1 to learn to score Block R23VAX5/PM. 
Group 2 scored the same set of 50 students' works that Group 1 had scored in the 
morning, but the scales they used had different design features than the scales 
Group 1 had used. 



At the end of the second day, we gave each rater a questionnaire to fill out to 
gather their reactions to using the experimental rating scales. We provided them 
with postage-paid envelopes and asked them to return the completed 
questionnaires within a week. 



Data Analysis 



To analyze the rating data from this study, we employed Facets (Linacre, 
1994a), a Rasch-based rating scale analysis computer software program. Facets is a 
generalization of Wright and Masters' (1982) Partial Credit model which makes 
possible the analysis of data from assessments that have more than the traditional 
two "facets" associated with multiple-choice tests (i.e., "items" and "examinees"). 



²Note that each of the four blocks was scored twice, once in the morning and once in the afternoon (by a different set 
of raters). Hence, the designation "AM" or "PM" appearing after each block. 

In the many-facet Rasch model (Linacre, 1994b), each "element" of each facet 
of the assessment situation (e.g., each student, rater, rating scale category, etc.) is 
represented by one parameter. In this study, the model contains a parameter 
representing student "ability," a second parameter representing rater "severity," and 
a third parameter representing rating scale category "challenge." Facets has the form 
of a log-linear model for main effects, and estimates those effects in "logits," or the 
logarithms of odds of a given rating compared to the next lower one. For our study, 
the model takes the following particular form: the log-odds of the probability that a 
student with a "true" ability of $\theta$ will receive from Rater $j$ a rating in Category $k$ 
[denoted $P_{jk}(\theta)$] as opposed to receiving a rating in Category $k-1$ [denoted 
$P_{j,k-1}(\theta)$] on a rating scale with $K$ categories is modeled as 

$$\ln\left[\frac{P_{jk}(\theta)}{P_{j,k-1}(\theta)}\right] = \theta - \xi_j - \tau_k \qquad (1)$$

where $\xi_j$ is the "severity" parameter associated with Rater $j$, and $\tau_k$, for $k = 2, \ldots, K$, is a 
parameter indicating the relative probability of a rating in Category $k$ as opposed to 
Category $k-1$ for the scale, with $\tau_1 = 0$. It follows that the probability of a rating in 
Category $k$ for a student with parameter $\theta$ from Rater $j$ is 

$$P_{jk}(\theta) = \frac{\exp\left[\sum_{h=1}^{k} (\theta - \xi_j - \tau_h)\right]}{\sum_{l=1}^{K} \exp\left[\sum_{h=1}^{l} (\theta - \xi_j - \tau_h)\right]} \quad \text{for } k = 1, \ldots, K. \qquad (2)$$
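To make the relationship between Equations (1) and (2) concrete, the category probabilities can be computed directly from a student ability, a rater severity, and a set of category parameters. The following is only a minimal sketch, not the Facets program's own estimation routine: the function name `category_probabilities`, the parameter values in the usage note, and the convention of fixing the first category parameter at zero are all assumptions made for illustration.

```python
import math


def category_probabilities(ability, severity, category_params):
    """Return the probability of each rating category, lowest to highest.

    category_params holds one value per category; by the convention assumed
    in this sketch, the first entry is 0.
    """
    # Cumulative sums of (ability - severity - category parameter),
    # one sum per category, as in the numerator of Equation (2).
    cumulative = []
    running = 0.0
    for param in category_params:
        running += ability - severity - param
        cumulative.append(running)

    # Subtract the largest exponent before exponentiating; this leaves the
    # ratios unchanged and avoids overflow for extreme parameter values.
    peak = max(cumulative)
    weights = [math.exp(value - peak) for value in cumulative]
    total = sum(weights)
    return [weight / total for weight in weights]
```

For hypothetical values such as `category_probabilities(1.0, 0.25, [0.0, -0.5, 0.75])`, the probabilities sum to 1, and the log of the ratio of each category's probability to that of the category below it equals ability minus severity minus that category's parameter, which is the relationship stated in Equation (1).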



When raters evaluated students' work, they were instructed to place a vertical 
slash along an 8-1/2" horizontal line to indicate where along that continuum they 
felt the students' work fell. To prepare these data for analysis, we measured from 
the left end of each line to the point where the rater's slash crossed that line, 
rounding to the nearest 1/16". We converted that number to its decimal equivalent 
and then transformed this 0 to 8.5 scale to a scale that ran from 0 to 255 (255 being the 
maximum number of rating scale categories Facets can accommodate). 
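The measurement step just described can be sketched as follows. This is an illustration under stated assumptions: the rescaling is taken to be linear (8.5 inches times 30 gives 255), fractional results are rounded half up to a whole number, and the function name `slash_to_scale` is hypothetical. The paper does not say how fractional values were rounded.

```python
import math
from fractions import Fraction

LINE_LENGTH_INCHES = Fraction(17, 2)  # the 8-1/2" horizontal line
SCALE_MAX = 255                       # upper end of the transformed scale


def slash_to_scale(sixteenths_from_left):
    """Convert a slash position, in whole sixteenths of an inch from the
    left end of the line, to a whole number on the 0 to 255 scale."""
    if not 0 <= sixteenths_from_left <= LINE_LENGTH_INCHES * 16:
        raise ValueError("slash position lies outside the 8-1/2 inch line")
    inches = Fraction(sixteenths_from_left, 16)       # decimal equivalent
    scaled = inches * SCALE_MAX / LINE_LENGTH_INCHES  # linear rescaling
    return math.floor(scaled + Fraction(1, 2))        # assumed: round half up
```

Under these assumptions, a slash exactly 1 inch from the left end (16 sixteenths) maps to 30, and a slash at the right end of the line (136 sixteenths) maps to 255.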

For each block, we ran a series of eight Facets analyses. For example, for Block 
R1VAX1/AM we first analyzed the data as if the two scales were 3-point scales and 
then examined how the rating scales functioned. (For this analysis, we divided the 
horizontal line (i.e., the 0 to 255 scale) into three equal segments: ratings from 1-85 
were recoded as "1," ratings from 86-170 were recoded as "2," and ratings from 
171-255 were recoded as "3.") Using Facets' recoding capabilities, we then ran additional 
analyses to see how the scales would function if we were to consider the scales as 
4-point scales (i.e., dividing the 0 to 255 horizontal scale into four equal segments), as 
5-point scales, as 6-point scales, as 7-point scales, as 8-point scales, as 9-point scales, 
and, finally, as 10-point scales.³ By comparing the output from the various analyses, 
we sought to determine what the optimum rating scale structure was for each scale 
from the standpoint of measurement precision and scale discriminability. 
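The equal segments recoding described above can be sketched in a few lines. This is an illustration rather than a reproduction of Facets' recoding facility: the function name `recode_equal_segments` is hypothetical, a rating of 0 is assumed to fall in the lowest category, and a rating that lands exactly on a segment boundary is assumed to go to the lower category. That boundary rule reproduces the 1-85, 86-170, 171-255 split given for the 3-point case; the paper does not state the rule for segment counts that do not divide 255 evenly.

```python
def recode_equal_segments(rating, n_categories):
    """Recode a 0 to 255 rating into one of n_categories equal-width segments."""
    if not 0 <= rating <= 255:
        raise ValueError("rating must lie on the 0 to 255 scale")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    # Ceiling of rating * n_categories / 255, using integer arithmetic only.
    category = -(-rating * n_categories // 255)
    # Assumed: a rating of 0 belongs to the lowest category.
    return max(1, category)


# The eight recodings of a single rating, from a 3-point to a 10-point scale.
recodings = {n: recode_equal_segments(128, n) for n in range(3, 11)}
```

For a rating of 128, near the middle of the line, this gives category 2 on the 3-point recoding and category 6 on the 10-point recoding.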

In addition to the Facets analyses, we also ran intraclass correlational analyses 
(Berk, 1979; Cherry & Meyer, 1993; Cronbach, Ikeda, & Avener, 1964; Ebel, 1951; 
Shrout & Fleiss, 1979) so that we could compare the Rasch student separation 
reliabilities reported as part of the Facets output to conventional intraclass 
correlations. Based on analysis of variance procedures, intraclass correlation 
expresses the "classical theory of measurement error relationship between true and 
observed variance" (Berk, p. 463). In our study, all raters rated all students included 
in each block. Therefore, we used the following formula for fully crossed designs as 
recommended by Cherry and Meyer (1993) to calculate intraclass correlation:⁴ 

$$r = \frac{MS_p - MS_e}{MS_p + (k-1)\,MS_e}$$

where $MS_p$ is the between-persons mean square, $MS_e$ is the error mean square, and 
$k$ is the number of raters. 
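As a worked illustration of this formula, the mean squares can be obtained from a students-by-raters table of ratings with a two-way analysis of variance and then combined as shown. The sketch below assumes a complete table (every rater rates every student); the function name `intraclass_correlation` and the sample ratings are illustrative values, not data from this study.

```python
def intraclass_correlation(ratings):
    """Intraclass correlation for a fully crossed design.

    ratings is a list of rows, one row per student, one column per rater.
    """
    n = len(ratings)        # number of students
    k = len(ratings[0])     # number of raters
    grand_mean = sum(sum(row) for row in ratings) / (n * k)

    student_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for the two-way layout without replication.
    ss_students = k * sum((m - grand_mean) ** 2 for m in student_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in rater_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_students - ss_raters

    ms_students = ss_students / (n - 1)         # between-persons mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # error mean square

    return (ms_students - ms_error) / (ms_students + (k - 1) * ms_error)


# Illustrative ratings: 6 students (rows) rated by 4 raters (columns).
sample = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = intraclass_correlation(sample)
```

Because differences in rater averages are removed before the error mean square is computed, two raters who differ only by a constant amount yield a coefficient of 1. For the sample table the coefficient is approximately 0.71.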



³An alternative strategy for recoding ratings involves defining the rating scale categories such that the categories 
have nearly equal numbers of ratings (i.e., counts) in each (J. M. Linacre, personal communication, Nov. 5, 1995). For 
example, if 6 raters each rated 50 pieces of student work on a single descriptive graphic rating scale, 300 ratings would 
be generated. Suppose we wanted to analyze these ratings to see how the scale would function as a 3-point scale using 
the equal counts recoding strategy. If the first 100 ratings fell between 1 and 115, we would recode each of these ratings 
as "1." If the next 100 ratings fell between 116 and 145, we would recode these ratings as "2." If the last 100 ratings fell 
between 146 and 255, we would recode these ratings as "3." Using this recoding strategy, we would now have a 
3-point scale with the three categories each containing an equal number of ratings. Note how the equal counts recoding 
strategy differs from the recoding strategy we used in this study (i.e., defining the rating scale categories by dividing the 
horizontal line, the 0 to 255 scale, into equal segments rather than dividing the total number of ratings given into equal 
counts). Initially, we ran a series of Facets analyses using the equal counts recoding strategy and another set of Facets 
analyses on the same data (i.e., 4 of the 8 blocks) using the equal segments recoding strategy so that we could compare 
output from both sets of analyses. In each case, there was very little difference between the two sets of output when we 
examined key indicators (i.e., student separation, rater separation, interrater reliability attenuated by rater variance). 
We include in this report only findings from the analyses in which we used the equal segments recoding strategy. 
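The equal counts recoding strategy described in the note above can be sketched as follows. This is one possible reading, not the procedure actually run in Facets: cut points are taken from the sorted ratings so that each category receives as close to the same number of ratings as possible, and tied ratings are kept together in the same category (so counts may be only nearly equal). The function names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def equal_count_cutpoints(ratings, n_categories):
    """Return the upper bound of every category except the highest."""
    if not ratings:
        raise ValueError("no ratings to recode")
    ordered = sorted(ratings)
    n = len(ordered)
    # The i-th cut point is the rating at position ceil(i * n / n_categories).
    return [ordered[-(-i * n // n_categories) - 1]
            for i in range(1, n_categories)]


def recode_equal_counts(ratings, n_categories):
    """Recode ratings so the categories hold nearly equal numbers of ratings."""
    cuts = equal_count_cutpoints(ratings, n_categories)
    # A rating's category is one more than the number of cut points below it.
    return [1 + bisect_left(cuts, rating) for rating in ratings]


# 300 distinct illustrative ratings recoded to a 3-point scale.
example_ratings = list(range(1, 301))
example_counts = Counter(recode_equal_counts(example_ratings, 3))
```

With 300 distinct ratings, each of the three categories receives exactly 100 ratings.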



⁴This formula is based on a two-way mixed effects ANOVA having two independent variables: "raters" (which is 
treated as a fixed effect) and "students" (which is treated as a random effect). Cherry and Meyer (1993) describe how 
between-rater variance is treated when computing intraclass correlation using this formula: "The difference in 
average scores between two raters (or among three or more raters) is attributed to raters consistently applying slightly 
different standards of judgment rather than to error, and the difference in average scores is therefore considered true 
variance" (p. 133). 







Results 

Our study sought to answer four related sets of questions. We structured our 
discussion of research findings around the specific questions we explored with the 
Facets output. 



• How many categories do the descriptive graphic rating scales we designed 
support? How should we think about them? as 3-point scales? as 4-point 
scales? as 5-point scales? etc. 

Facets provides several pieces of output that can help us answer these
questions. For each scale, Facets reports the percentage of ratings that fall into each
category, which facilitates examination of category usage by raters. Beyond this,
Facets reports the "Average Measure Difference" (AMD) for each category on a scale.
Linacre (1994b) defines average measure difference as "the average of the [student]
measures [of ability] that are modeled to generate the observations in this category"
(p. 69). As we move from categories at the lower end of a scale to categories at the
higher end of a scale, we would hope to see a pattern of ascending AMD's (Linacre,
1995). When we see evidence of this kind of pattern occurring, it suggests that the
rating scale categories are appropriately ordered and are functioning properly.
Higher ratings do correspond to "more" of the variable being rated. If AMD's do not
increase (i.e., if we see identical values for adjacent categories or one or more 
descending values), then that suggests that some of the categories are not 
functioning as intended. (For example, while we may have intended for a scale we 
designed to function as a 5-point scale, the raters may instead be using it as a 3- or 4- 
point scale. Raters may find that some of the categories are not clearly differentiated 
from one another.) 

Facets provides an additional check on category ordering. For each category,
Facets reports "the lowest [student ability] measure at which this category is the one
most probable to be observed" (Linacre, 1994b). Facets identifies those categories that
are never most probable to be observed for any student ability measure. Like the 
AMD's, these "Most Probable" Thresholds (MPT's) should also be ordered, 
increasing as we move from categories at the lower end of a scale to categories at the 
higher end of a scale. If a category is never most probable to be observed, then that 
suggests that there are problems with the rating scale categories (i.e., some categories
are not distinguishable and are underutilized) and may signal a need to reduce the
number of categories by combining some of them. As Andrich (1996) notes, a scale 
may show ascending AMD's but not ascending MPT's. 

In Tables 1 through 8 we report both AMD’s and MPT's for the descriptive 
graphic scales used in this study. For each scale, we show how many ascending 
values appeared in the output for the AMD's and for the MPT's. When we look 
across values reported for Scales 1 and 2 in the rows labeled "Most Probable From"
and "Average Measure Difference" within a table, we can get a sense of how many
categories each of the two descriptive graphic rating scales in that block could 
support. For example, in Table 1 we note that when we analyzed Scale 1 as a 3-point 
scale, the three categories on that scale had ascending AMD's and ascending MPT's. 
It would be appropriate, then, to think of Scale 1 as a 3-point scale. As we look across 
the Scale 1 "Most Probable From" row and the Scale 1 "Average Measure Difference"
row, we see that Scale 1 would also support an interpretation of it as a 4-point, 5-
point, 6-point, or 7-point scale. In each case, all the AMD's and MPT's are ascending
for Scale 1. We see, though, that when we analyzed Scale 1 as an 8-point scale, the 8
categories on that scale had ascending MPT's, but only 7 categories had ascending 
AMD's. These findings would cast some doubt on whether Scale 1 would support 
an 8-point interpretation. When we analyzed Scale 1 as a 9-point scale, only 8 
categories had ascending MPT’s. It appears, then, that if we use "Average Measure 
Differences" as the decision-making criterion, the number of scale points that Scale 1 
would support is 7. By contrast, if we were to use the "Most Probable" Thresholds as 
the decision-making criterion, the number of scale points that Scale 1 would support 
is 8. 
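The decision rule described above can be illustrated with a short Python sketch. It is a hedged reading of the text, not Facets output processing: the function names are hypothetical, the values in the usage example are invented, and the rule of stopping at the first analysis whose values fail to ascend is an assumption drawn from the Scale 1 example.

```python
from typing import Dict, Sequence


def is_ascending(values: Sequence[float]) -> bool:
    """True when every value is strictly greater than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


def max_supported_points(by_points: Dict[int, Sequence[float]]) -> int:
    """Largest number of scale points k such that every analysis up to k ascends.

    `by_points` maps a number of scale points to the per-category values
    (AMD's or MPT's) reported when the scale is analyzed with that many points.
    Returns 0 if even the smallest analysis is not ascending.
    """
    supported = 0
    for k in sorted(by_points):
        if not is_ascending(by_points[k]):
            break
        supported = k
    return supported


if __name__ == "__main__":
    # Invented AMD values, for illustration only.
    amd = {
        3: [-1.0, 0.0, 1.2],
        4: [-1.5, -0.3, 0.4, 1.6],
        5: [-1.8, -0.6, 0.1, 0.1, 1.9],  # two adjacent categories tie
    }
    print(max_supported_points(amd))
```

Running the same check once on the AMD's and once on the MPT's gives the two, possibly different, answers that the two criteria produce for a scale.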



Insert Tables 1 to 8 about here 



What is the number of scale points that our scales could support? When we 
examine summary Table 9, we see that if we were to use "Most Probable" 
Thresholds as our decision-making criterion, then we would conclude that all the 
scales could support at least a 5-point interpretation, while some of the scales could 
be thought of as supporting as many as 7 or 8 points. However, if we were to use
"Average Measure Differences" as our decision-making criterion, then we would
conclude that all the scales could support at least a 7-point interpretation, while
some of the scales could be thought of as supporting as many as 9 or 10 points. 



Insert Table 9 about here 



• What is the effect of the number of categories on reliability of ratings for these 
scales? Does reliability cease to increase and instead begin to decrease at a 
certain point? Is there a point beyond which there is little utility to be gained 
in adding scale categories? 

To answer these questions, we focus on student separation reliabilities and 
intraclass correlation coefficients contained in Tables 1 through 8. In Rasch terms, 
student separation is a measure of the spread of the estimates of student ability 
relative to their precision (Linacre, 1994b). The student separation index indicates 
the number of statistically different strata of student ability in the sample of students 
evaluated by the rating scales (Wright, 1996). Student separation has a range of 0 to
∞. When we look across Tables 1 to 8, it appears that we could consistently identify
between 2 and 4 student strata, depending upon the number of points on the scales 
used. Generally, as the number of rating scale points increases, student separation 
increases. However, for each block, there is a certain point at which the amount of 
increase levels off, or, in some cases, actually begins to decrease. 

While Facets does not provide a measure of interrater reliability per se, it does 
include a separation reliability which, like interrater reliability, has a range of 0 to 1. 
The Rasch student separation reliability is the ratio of "true" variance in student 
scores to the "observed" variance in student scores. As Wright explains (1996), "In 
Rasch terms, 'true' variance is the 'adjusted' variance (observed variance adjusted 
for measurement error). Error variance is a mean-square error (derived from the 
model) inflated by misfit to the model encountered in the data" (p. 472). The 
student separation reliabilities we report in Tables 1 through 8 have been attenuated 
for rater variance (Linacre, 1991). As Linacre (1991) notes, there is usually little 
difference between Rasch student separation reliabilities and interrater reliabilities 
when equivalent variance terms are used to compute them. Student separation 
reliabilities for the eight blocks are generally in the range of .70 to .90. As the 
number of rating scale points increases, separation reliability increases; but for each
block, we reach a point of diminishing returns (i.e., the reliability ceases to increase,
or, in some cases, actually decreases). In some blocks, that leveling off or decrease 
tends to occur as we move from 5-point scales to 6-point scales (for blocks 
R1VAX1/AM, R1VAX1/PM, R123VAX6/AM, R23VAX7/AM, R23VAX7/PM),
while for other blocks this occurs as we move from 7-point scales to 8-point scales 
(for blocks R123VAX6/PM, R23VAX5/AM, and R23VAX5/PM). 
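The relationship between "true" variance, "observed" variance, and the two separation statistics can be sketched as follows. This uses the standard Rasch definitions (reliability as true variance over observed variance, separation as the true standard deviation over the root mean-square error) and is only an approximation of what Facets reports: the function name and inputs are hypothetical, the use of the sample (n - 1) variance is an assumption, and no attenuation for rater variance is applied, so the numbers would not match those in Tables 1 through 8.

```python
import math
from typing import Sequence, Tuple


def separation_stats(measures: Sequence[float],
                     std_errors: Sequence[float]) -> Tuple[float, float]:
    """Return (separation, separation reliability) for a set of student measures.

    measures   -- estimated student ability measures (logits)
    std_errors -- standard error of each measure (logits)
    """
    n = len(measures)
    mean = sum(measures) / n
    observed_var = sum((m - mean) ** 2 for m in measures) / (n - 1)
    error_var = sum(se ** 2 for se in std_errors) / n      # mean-square error
    true_var = max(observed_var - error_var, 0.0)           # "adjusted" variance
    reliability = true_var / observed_var
    separation = math.sqrt(true_var / error_var)
    return separation, reliability


if __name__ == "__main__":
    # Invented measures and standard errors, for illustration only.
    g, r = separation_stats([-2.0, -1.0, 0.0, 1.0, 2.0], [0.5] * 5)
    print(round(g, 2), round(r, 2))
```

Under these definitions the two statistics carry the same information, since reliability equals G squared divided by (1 + G squared) for separation G; separation is simply unbounded above, while reliability is capped at 1.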

To facilitate comparison of Rasch student separation reliability with a more 
traditional measure of interrater reliability, we calculated intraclass correlation 
coefficients. For most of the blocks, the intraclass correlations tend to be somewhat 
lower than the comparable student separation reliabilities (although for three of the 
blocks--R23VAX7/PM, R23VAX5/AM, and R23VAX5/PM--the intraclass
correlations are actually somewhat higher than the separation reliabilities). For all 
the blocks, the intraclass correlations are generally in the range of .65 to .90. As the 
number of rating scale points increases, intraclass correlation increases. However, 
we reach a point of diminishing returns for each block (i.e., the correlations cease to 
increase, or, in some cases, begin to decrease), just as occurred with the student 
separation reliabilities. That leveling off (or decrease) tends to occur as we move 
from 4-point scales to 5-point scales for block R23VAX7/PM; from 5-point scales to 6- 
point scales for blocks R123VAX6/PM, R23VAX7/AM, and R23VAX5/AM; from 6- 
point scales to 7-point scales for blocks R123VAX6/AM and R23VAX5/PM; and from 
7-point scales to 8-point scales for blocks R1VAX1/AM and R1VAX1/PM.
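For readers who want to see how an intraclass correlation of this general kind is computed, the sketch below works from a two-way ANOVA decomposition with raters treated as fixed and students as random, so that differences between rater means are not counted as error. The exact formula used in this study is not restated in this section, so the choice of the Shrout and Fleiss (1979) consistency forms here is an assumption, as are the function name and the example ratings.

```python
from typing import List, Tuple


def icc_consistency(ratings: List[List[float]]) -> Tuple[float, float]:
    """Return (single-rater ICC, average-of-k-raters ICC) for a students x raters table.

    Each row holds one student's ratings from every rater. Rater mean
    differences are removed from the error term (two-way mixed model).
    """
    n = len(ratings)        # students
    k = len(ratings[0])     # raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_students = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_students - ss_raters

    bms = ss_students / (n - 1)                 # between-students mean square
    ems = ss_error / ((n - 1) * (k - 1))        # residual mean square
    single = (bms - ems) / (bms + (k - 1) * ems)
    average = (bms - ems) / bms
    return single, average


if __name__ == "__main__":
    # Illustrative ratings: 6 students (rows) by 4 raters (columns).
    table = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    single, average = icc_consistency(table)
    print(round(single, 3), round(average, 3))
```

In the example the raters differ considerably in severity, yet the coefficients stay fairly high because those systematic differences are set aside rather than treated as error.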

To summarize, it appears that for both student separation and intraclass 
correlation little appreciable gain in reliability occurs if we think about these scales 
as having more than 5 points. In general, moving from 3-point scales to 5-point 
scales results in a useful gain in reliability (i.e., ranging from a gain of .03 to .10 for 
student separation, and a gain of .03 to .14 for intraclass correlation). However, 
moving from 5-point scales to 10-point scales nets, at best, a .03 gain for student 
separation (for blocks R123VAX6/PM and R23VAX7/PM) and a .04 gain for 
intraclass correlation (for block R23VAX7/PM). (More often, as we move from 
thinking about these as 5-point scales to thinking about them as 10-point scales, the 
gain in reliability is on the order of .01 to .02 for both these indices.) 

• How do various design features of these scales affect reliability? Do raters use
descriptive graphic rating scales with defined midpoints any more reliably
than scales without defined midpoints? Do raters use descriptive graphic
rating scales with hatchmarks any more reliably than scales without
hatchmarks? Are 5 hatchmarks any better than 3?

To answer these questions, we compared the student separation reliabilities 
(Table 10) and the intraclass correlation coefficients (Table 11) for the eight blocks for 
5-point scales. (We used the 5-point scale data since all of the scales included in this 
study could support a 5-point interpretation and, for the most part, there seemed to 
be little appreciable gain in reliability beyond 5 points.) In each table, we ordered the 
indices (i.e., the separation reliability coefficients and the intraclass coefficients) from 
high to low. For each block we included information about the design features of 
the scales in that block (i.e., presence or absence of a defined midpoint and number 
of hatchmarks). 



Insert Tables 10 and 11 about here 



The student separation reliabilities reported in Table 10 range from .79 to .91.
Whether or not a scale had a defined midpoint did not seem to affect separation
reliability. When we examine the blocks with the highest separation reliabilities,
we see that some of the blocks had a defined midpoint (i.e., R1VAX1/PM), while
others did not (i.e., R123VAX6/PM). Similarly, when we examine the blocks with
the lowest separation reliabilities, we see that some of the blocks had a defined
midpoint (i.e., R23VAX7/PM), while others did not (i.e., R23VAX5/AM). Also,
there does not appear to be a consistent relationship between number of hatchmarks
and separation reliability. The blocks with the highest separation reliabilities
contain no hatchmarks (i.e., R1VAX1/PM and R123VAX6/PM), but the block with
the lowest separation reliability also contained no hatchmarks (i.e., R23VAX7/PM).

The intraclass correlation coefficients reported in Table 11 range from .75 to
.91. Whether or not a scale had a defined midpoint did not seem to affect intraclass
correlation. When we examine the blocks with the highest intraclass correlations,
we note that some of the blocks had a defined midpoint (i.e., R23VAX5/PM), while
others did not (i.e., R23VAX5/AM). Similarly, when we examine the blocks with
the lowest intraclass correlations, we see that some of the blocks had a defined
midpoint (i.e., R123VAX6/AM), while others did not (i.e., R1VAX1/AM and
R123VAX6/PM). Additionally, it does not appear that number of hatchmarks affects
intraclass correlation. When we examine the two blocks containing scales with 5
hatchmarks, we see that one of the blocks had the highest intraclass correlation (i.e.,
R23VAX5/PM), while the other block had one of the lowest intraclass correlations
(i.e., R1VAX1/AM).

• Will raters use descriptive graphic rating scales reliably if in their training 
they are only shown and discuss examples of students’ work (i.e., anchors) for 
the endpoints of the scale but not for the midpoint? If they see and talk about 
anchors that fall at both ends and in the middle, will they produce more 
reliable ratings? If they are shown anchors that fall at various points along 
the full continuum, will they produce even more reliable ratings? What's the 
bare minimum raters need in the way of training anchors in order to use 
these descriptive graphic rating scales reliably? 

To answer these questions we refer to Tables 10 and 11. Each table includes
information about the anchors and practice sets used in training for each block.
When we review these tables, we see little evidence of a consistent relationship
between the nature of the anchors used in training and reliability.



Table 10 reveals that some of the blocks with the highest separation
reliabilities had anchors showing endpoints and a midpoint (i.e., R1VAX1/PM),
while other blocks had anchors showing endpoints only (i.e., R123VAX6/PM and
[text missing in source]). Similarly, when we examine the blocks with the lowest separation
reliabilities, we see that some of the blocks had anchors showing endpoints and a
midpoint (i.e., R23VAX7/PM), while other blocks had anchors showing endpoints
only (i.e., R23VAX5/AM). The block that had anchors showing all five scale points
(i.e., R23VAX5/PM) had neither the highest nor the lowest separation reliability.



We see much the same story when we review the information in Table 11,
with one difference: the block that had anchors showing all five scale points (i.e.,
R23VAX5/PM) had the highest intraclass correlation (.91). The three blocks having
anchors that showed endpoints and a midpoint (i.e., R1VAX1/PM, R23VAX7/PM,
and R123VAX6/AM) had intraclass correlations in the range of .80 to .85. We see
somewhat greater variation across the blocks having anchors that showed only
endpoints. Some of these blocks (i.e., R23VAX5/AM and R23VAX7/AM) had
intraclass correlations in the range of .87 to .91, while other blocks (i.e.,
R1VAX1/AM and R123VAX6/PM) had lower intraclass correlations in the range of
.75 to .79.

Finally, it is interesting to note that the one block that had practice sets 
showing examples of student work for endpoints only (i.e., R123VAX6/PM) had the 
lowest intraclass correlation (.75) but one of the highest separation reliabilities (.90). 

Discussion 

Can trained raters use descriptive graphic rating scales to evaluate students' 
works of art? Based on findings from our study, we conclude that the raters were 
able to reliably use the scales we constructed. We found that all the scales would 
support at least a 5-point interpretation, and that individually some of the scales 
could be thought of as supporting as many as 7 to 10 points. These findings lend 
support to the suggestion made by Cronbach et al. (1994) that raters be encouraged to 
use midpoints between defined categories to improve the accuracy of the rating 
process, thus reducing a source of measurement error. It appears that the raters in 
this study were able to make finer distinctions than a traditional 3- or 4-point 
scoring rubric allows. However, it's important to note that we found little 
appreciable gain in reliability for scales having more than 5 points, confirming the 
findings of Bendig (1952a, 1952b, 1953, 1954a, 1954b), Lissitz and Green (1975), and 
Jenkins and Taber (1977). In general, moving from 3-point to 5-point scales resulted 
in a useful gain in reliability, but the net gain in reliability associated with moving 
from 5-point to 10-point scales was minimal. 

When we designed the descriptive graphic rating scales for this study, we 
varied certain design features of the scales. The individual scales exhibited different
combinations of the design features. For some of the scales, we defined two 
endpoints of the scale and a midpoint; for other scales, we defined only the two 
endpoints. For some of the scales we constructed a horizontal line to connect the 
endpoints and then placed either three or five hatchmarks at specific points along 
the line to show key transition points along the continuum. Other scales had no 
hatchmarks along the horizontal line. We wanted to determine whether these two 
design features (i.e., presence or absence of a defined midpoint, number of 
hatchmarks) affected interrater reliability. We looked at two measures of rater 
reliability: Rasch student separation reliabilities and intraclass correlations. We
computed these measures for our scales, considering the scales as 5-point scales. The
student separation reliability coefficients for the 5-point scales ranged from .79 to .91. 
The intraclass correlation coefficients for these same scales ranged from .75 to .91. 
Whether or not a scale had a defined midpoint did not affect student separation or 
intraclass correlation. Similarly, whether the scale had 0, 3, or 5 hatchmarks did not 
affect these measures. These findings lend credence to the views of Guion (1986) 
and Cronbach (1990) who contend that the particular features of a scale are not as 
important as the knowledge, skills, and motivation of the rater. 

We experimented with several approaches to using anchors during training.
In one training session, the trainer showed and discussed anchors for five points
along the continuum (i.e., the two endpoints and three points in between) and then
provided practice sets for the rater to use that contained students' works covering
the full continuum. For other training sessions, the trainer showed and discussed
anchors for both endpoints of each scale and for the midpoint, and then raters
practiced scoring works that covered the full continuum. In still other training
sessions, the trainer showed and discussed anchors for only the endpoints, and the
raters then practiced scoring works covering the full continuum. We wanted to
know whether raters needed to see and talk about anchors along the full continuum
in order to use the scales reliably, or whether they could score reliably after having
seen anchors for two endpoints and a midpoint, or for endpoints only. Our findings
would suggest that raters can use descriptive graphic rating scales reliably if they see
and talk about anchors for both endpoints of the scale but then have some practice
rating examples of students' works that cover the full continuum of student ability.

If NAEP program personnel were to decide to include some scales in the 
descriptive graphic format to score some of the production tasks included in the 
1997 NAEP visual arts assessment, then there are some operational concerns that 
will need to be addressed. As a first step, we would need to consider ways to 
streamline the process of translating a rater's slash on a line into a score. For this
study, we manually carried out the various steps in this process, measuring from
the left end of each line to the point where the rater's slash crossed that line, and
then rounding to the nearest 1/16". After we converted that number to its decimal 
equivalent, we transformed the 0 to 8.5 scale to a scale that ran from 0 to 255. This 
was a laborious and time-consuming process that, perhaps, could be automated. We 
could investigate the feasibility of transferring the rating scales to a computer,
having the raters use a mouse to click at the point on the horizontal line where they
judge a student's work to lie, and then having the computer convert that mark into 
a score. If we could use technology to automate the steps in this data preparation 
process, then that could result in considerable time and cost savings. 
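As a rough indication of how little code the manual conversion would require, the steps described above (measure in inches from the left end, round to the nearest 1/16", convert to a decimal, and rescale the 0 to 8.5 line to 0 to 255) can be sketched in Python. The function and constant names are hypothetical, and whether the final score was itself rounded to a whole number is not stated here, so the sketch leaves it unrounded.

```python
LINE_LENGTH_INCHES = 8.5   # length of the horizontal line, as described in the text
SCALE_MAX = 255            # top of the transformed score scale


def slash_to_score(inches_from_left: float) -> float:
    """Convert a measured slash position (in inches) to a score on the 0 to 255 scale."""
    if not 0 <= inches_from_left <= LINE_LENGTH_INCHES:
        raise ValueError("slash position must lie on the line")
    # Round to the nearest 1/16 inch. Python's round() sends exact halves to
    # the nearest even integer, which is an implementation detail of this
    # sketch, not something the study specifies.
    sixteenths = round(inches_from_left * 16)
    rounded_inches = sixteenths / 16
    # Rescale 0..8.5 inches to 0..255 (30 score units per inch).
    return rounded_inches * SCALE_MAX / LINE_LENGTH_INCHES


if __name__ == "__main__":
    for position in (0.0, 4.27, 8.5):
        print(position, slash_to_score(position))
```

On a computer-administered scale the measurement step would disappear entirely, since the click position could be recorded directly in the units of the score scale.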

Traditionally, when raters use scoring rubrics in NAEP, program personnel 
overseeing the scoring process consider ratings that are more than 2 points apart (or, 
in the case of 3-point rubrics, more than 1 point apart) discrepant, and a third rater is 
brought in to adjudicate the discrepancy. But what does "discrepancy" mean for 
raters using descriptive graphic rating scales? How far apart do two raters' slash 
marks on a horizontal line need to be in order to be considered discrepant? Suppose 
we were to establish a guideline for defining what we mean by discrepancy. Could 
we then program a computer to identify students' works that received discrepant 
ratings so that they could be set aside for third-rater adjudication? 
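Once a guideline exists, the screening step itself is simple to automate. The Python sketch below is purely illustrative: the function name, the work identifiers, and above all the threshold of 51 score units (one fifth of the 0 to 255 line) are hypothetical placeholders, since no discrepancy guideline has been established.

```python
from typing import Dict, List, Tuple


def flag_discrepant(ratings: Dict[str, Tuple[float, float]],
                    max_gap: float) -> List[str]:
    """Return the IDs of works whose two ratings differ by more than max_gap.

    ratings -- maps a work ID to the pair of scores given by the two raters
    max_gap -- largest difference (in score units) still treated as agreement
    """
    return sorted(work for work, (first, second) in ratings.items()
                  if abs(first - second) > max_gap)


if __name__ == "__main__":
    # Invented scores on the 0 to 255 scale.
    pairs = {
        "work-01": (120.0, 131.25),
        "work-02": (60.0, 150.0),
        "work-03": (200.0, 149.0),
    }
    # 51 units is a placeholder threshold, not a recommended value.
    print(flag_discrepant(pairs, max_gap=51))
```

The substantive question remains the choice of `max_gap`; the code only shows that any such choice could be applied mechanically to set works aside for third-rater adjudication.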

If we were to incorporate descriptive graphic rating scales into NAEP 
assessments, we would also need to work through a number of issues related to 
combining and reporting assessment results: 

• Can we combine results from students' performance on production tasks 
scored using descriptive graphic rating scales alongside results from students' 
performance on other types of exercises included in the NAEP visual arts 
assessment (i.e., short constructed response exercises, multiple-choice items, 
extended constructed response exercises, production tasks scored using 
traditional 3- and 4-point rubrics)? Is it psychometrically feasible to produce a 
single unidimensional scale that will encompass these diverse sources of 
information about students' performance? If we cannot produce a single 
unidimensional scale, is it feasible to produce several scales? 

• If we were to report narratively on student performance for production tasks 
scored using descriptive graphic rating scales, what type of reporting format 
should we use? (For NAEP assessments in other content areas, illustrative 
exercises are often presented as part of the final report, and the percentage of
student responses falling into each category is reported. However, these
illustrative exercises have typically been scored using rubrics containing 3 or 4
discrete categories. How would we report on student performance if our
scales do not contain discrete categories?)

• How will the information we derive from scoring these production exercises 
using the descriptive graphic rating scales feed into achievement level 
reporting? Can we use this scoring information to help us define basic, 
proficient, and advanced achievement levels in the visual arts? 

Conclusions 

The descriptive graphic rating scale format seems to hold promise as a 
suitable format for scoring production tasks to be included in the 1997 NAEP visual 
arts assessment. The scales we piloted in this study seemed to work quite well and, 
according to the indicators we used, appear psychometrically sound. As next steps, 
there are issues related to combining and reporting assessment results that will need 
to be addressed before such scales could become part of a NAEP operational 
assessment. 






References 

Andrich, D. (1996). Category ordering and their utility. Rasch Measurement: 
Transactions of the Rasch Measurement SIG, 9(4), 464-465. 

Bendig, A. W. (1952a). A statistical report on a revision of the Miami instructor 
rating sheet. Journal of Educational Psychology, 43, 423-429. 

Bendig, A. W. (1952b). The use of student rating scales in the evaluation of
instructors in introductory psychology. Journal of Educational Psychology, 43,
167-175.

Bendig, A. W. (1953). The reliability of self-ratings as a function of the amount of 
verbal anchoring and of the number of categories on the scale. Journal of 
Applied Psychology, 37, 38-41. 

Bendig, A. W. (1954a). Reliability and number of rating scale categories. Journal of 
Applied Psychology, 38, 38-40. 

Bendig, A. W. (1954b). Reliability of short rating scales and the heterogeneity of the 
rated stimuli. Journal of Applied Psychology, 38, 167-170. 

Berk, R. A. (1979). Generalizability of behavioral observations: A clarification of 
interobserver agreement and interobserver reliability. American Journal of 
Mental Deficiency, 83, 460-472. 

Cherry, R. D., & Meyer, P. R. (1993). Reliability issues in holistic assessment. In M. 
M. Williamson & B. A. Huot (Eds.), Validating holistic scoring for writing 
assessment (pp. 109-141). Cresskill, NJ: Hampton Press. 

Cronbach, L. J. (1990). Essentials of psychological testing (5th ed.). New York:
Harper & Row.

Cronbach, L. J., Bradburn, N. M., & Horvitz, D. G. (1994, July). Sampling and
statistical procedures used in the California Learning Assessment System.
Report of the Select Committee. Palo Alto, CA: Author.

Cronbach, L. J., Ikeda, M., & Avner, R. A. (1964). Intraclass correlation as an
approximation to the coefficient of generalizability. Psychological Reports, 15,
727-736.

Ebel, R. L. (1951). Estimation of the reliability of ratings. Psychometrika, 16, 407-424. 

Finn, R. H. (1972). Effects of some variations in rating scale characteristics on the 
means and reliabilities of ratings. Educational and Psychological 
Measurement, 32, 255-265. 






Guion, R. M. (1986). Personnel evaluation. In R. A. Berk (Ed.), Performance 
assessment: Methods and applications (pp. 345-360). Baltimore: Johns 
Hopkins University Press. 

Herman, J. L., Aschbacher, P. R., & Winters, L. (1992). A practical guide to
alternative assessment. Alexandria, VA: Association for Supervision and
Curriculum Development.

Jenkins, G. D., & Taber, T. A. (1977). A Monte Carlo study of factors affecting three
indices of composite scale reliability. Journal of Applied Psychology, 62, 392-
398.

Landy, F. J., & Farr, J. L. (1980). Performance rating. Psychological Bulletin, 87, 72-
107.

Landy, F. J., & Farr, J. L. (1983). The measurement of work performance: Methods,
theory, and applications. San Diego: Academic Press, Inc.

Linacre, J. M. (1991). Inter-rater reliability. Rasch Measurement: Transactions of the
Rasch Measurement SIG, 5(3), 166.

Linacre, J. M. (1994a). Facets [Computer program]. Chicago, IL: MESA Press.

Linacre, J. M. (1994b). A user's guide to Facets Rasch measurement computer
program. Chicago, IL: MESA Press.

Linacre, J. M. (1995). Categorical misfit statistics. Rasch Measurement: Transactions
of the Rasch Measurement SIG, 9(3), 450-451.

Linn, R. L., & Gronlund, N. E. (1995). Measurement and assessment in teaching
(7th ed.). Englewood Cliffs, NJ: Merrill.

Lissitz, R. W., & Green, S. B. (1975). Effect of the number of scale points on
reliability: A Monte Carlo approach. Journal of Applied Psychology, 60, 10-13.

Mehrens, W. A., & Lehmann, I. J. (1991). Measurement and evaluation in
education and psychology. Chicago: Holt, Rinehart and Winston, Inc.

Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits
on our capacity for processing information. Psychological Review, 63, 81-97.

Payne, D. A. (1992). Measuring and evaluating educational outcomes. New York:
Macmillan.

Popham, W. J. (1990). Modern educational measurement: A practitioner's
perspective. Englewood Cliffs, NJ: Prentice Hall.







Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater 
reliability. Psychological Bulletin, 86, 420-428. 

Stiggins, R. J. (1987). Design and development of performance assessments. 
Educational Measurement: Issues and Practices, 6(3), 33-42. 

Wright, B. D. (1996). Reliability and separation. Rasch Measurement: Transactions 
of the Rasch Measurement SIG, 9(4), 472. 

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. Chicago, IL: MESA 
Press. 






Figure 1. Experimental Design for Day 1. 








Figure 2. Experimental Design for Day 2. 



[Figure panels not reproduced. Legible panel labels: "Training anchors show endpoints and midpoint; practice sets show full continuum" and "Training anchors show all five scale points; practice sets show full continuum."]



Table 1. Summary statistics for R1VAX1/AM
(Each scale has no defined midpoint and five hatchmarks.)
(Anchors show endpoints only; practice sets show full continuum)

Number of scale points            3      4      5      6      7      8      9      10
Student Separation                --     4.20   4.28   --     4.34   4.57   4.63   4.60
Student Separation Reliability    .83    --     --     --     .89    --     --     --
Intraclass Correlation            .76    .77    .79    --     --     --     .82    .82
Most Probable From: Scale 1       --     --     --     --     --     --     --     --
Most Probable From: Scale 2       --     --     --     --     --     --     --     --
Average Measure Difference: Scale 1   --     --     --     --     --     --     --     --
Average Measure Difference: Scale 2   --     --     --     --     --     --     --     --

(-- = value not legible in this copy)

Table 3. Summary statistics for R123VAX6/AM
(Each scale has a defined midpoint and three hatchmarks.)
(Anchors show endpoints and midpoint; practice sets show full continuum)

Number of scale points            3      4      5      6      7      8      9      10
Student Separation                --     2.73   --     3.37   --     3.55   3.59   3.59
Student Separation Reliability    .78    .85    --     --     --     --     --     --
Intraclass Correlation            --     .78    --     --     --     --     .82    .82
Most Probable From: Scale 1       --     --     --     --     --     --     --     --
Most Probable From: Scale 2       --     --     --     --     --     --     --     --
Average Measure Difference: Scale 1   --     --     --     --     --     --     --     --
Average Measure Difference: Scale 2   --     --     --     --     --     --     --     --

(-- = value not legible in this copy)

Table 5. Summary statistics for R23VAX7/AM
(Each scale has no defined midpoint and no hatchmarks.)
(Anchors show endpoints only; practice sets show full continuum)

Number of scale points            3      4      5      6      7      8      9      10
Student Separation                3.06   3.66   --     4.07   4.26   4.34   4.43   --
Student Separation Reliability    .85    .85    --     --     --     --     .90    .89
Intraclass Correlation            .73    .83    .87    --     --     --     --     .89
Most Probable From: Scale 1       --     --     --     --     --     --     --     --
Most Probable From: Scale 2       --     --     --     --     --     --     --     --
Average Measure Difference: Scale 1   --     --     --     --     --     --     --     --
Average Measure Difference: Scale 2   --     --     --     --     --     --     --     --

(-- = value not legible in this copy)

Table 7. Summary statistics for R23VAX5/AM
(Each scale has no defined midpoint and three hatchmarks.)
(Anchors show endpoints only; practice sets show full continuum)



[Table body rotated in the source scan; cell values not recoverable.]




Table 9. Number of Scale Points Supported by Each Scale 



Block / Scale      "Most Probable From"   "Average Measure
                   (Thresholds)           Differences"
R1VAX1/AM
  Scale 1          n/l                    n/l
  Scale 2          n/l                    n/l
R1VAX1/PM
  Scale 1          n/l                    n/l
  Scale 2          n/l                    n/l
R123VAX6/AM
  Scale 1          n/l                    n/l
  Scale 2          n/l                    n/l
R123VAX6/PM
  Scale 1          n/l                    n/l
  Scale 2          n/l                    n/l
R23VAX7/AM
  Scale 1          n/l                    n/l
  Scale 2          n/l                    n/l
R23VAX7/PM
  Scale 1          n/l                    n/l
  Scale 2          n/l                    n/l
R23VAX5/AM
  Scale 1          n/l                    n/l
  Scale 2          n/l                    n/l
R23VAX5/PM
  Scale 1          n/l                    n/l
  Scale 2          n/l                    n/l

(n/l = not legible in the source scan)



[Body of the table captioned below: rotated in the source scan; cell values not recoverable. One partly legible column reads, in scan order: yes, no, no, (illegible), no, yes, no, yes.]






















Table 10. Comp 




Block
R1VAX1/PM
R123VAX6/PM
R1VAX1/AM
R123VAX6/AM
R23VAX7/AM
R23VAX5/PM
R23VAX5/AM
R23VAX7/PM




[Body of the table captioned below: rotated in the source scan; cell values not recoverable. One legible column reads, in scan order: yes, no, no, yes, yes, yes, no, no.]






















Table 11. C 




Block
R23VAX5/PM
R23VAX5/AM
R23VAX7/AM
R1VAX1/PM
R23VAX7/PM
R123VAX6/AM
R1VAX1/AM
R123VAX6/PM




Appendix A 

Descriptive Graphic Rating Scales 




[Rating scale forms: rotated in the source scan; text not recoverable.]






Block: R23VAX7/AM (Miseries and Hope) 

Student ID: 



[Rating scale form for this block: rotated in the source scan; text not recoverable.]



AERA April 8-12, 1996 







U.S. DEPARTMENT OF EDUCATION
Office of Educational Research and Improvement (OERI)
Educational Resources Information Center (ERIC)












(Specific Document) 



I. DOCUMENT IDENTIFICATION:

[Title: handwritten entry not legible in the source scan]

Author(s): [handwritten entry not legible in the source scan]

Corporate Source: [handwritten entry not legible in the source scan]

Publication Date: 1996



II. REPRODUCTION RELEASE: 



In order to disseminate as widely as possible timely and significant materials of interest to the educational community, documents 
announced in the monthly abstract journal of the ERIC system, Resources in Education (RIE), are usually made available to users 
in microfiche, reproduced paper copy, and electronic/optical media, and sold through the ERIC Document Reproduction Service 
(EDRS) or other ERIC vendors. Credit is given to the source of each document, and, if reproduction release is granted, one of 
the following notices is affixed to the document. 

If permission is granted to reproduce the identified document, please CHECK ONE of the following options and sign the release 
below. 



X  Check here: Permitting microfiche (4" x 6" film), paper copy, electronic, and optical media reproduction.

Sample sticker to be affixed to document (Level 1):
"PERMISSION TO REPRODUCE THIS MATERIAL HAS BEEN GRANTED BY __________ TO THE EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)."

Sample sticker to be affixed to document (Level 2):
"PERMISSION TO REPRODUCE THIS MATERIAL IN OTHER THAN PAPER COPY HAS BEEN GRANTED BY __________ TO THE EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)."

or here: Permitting reproduction in other than paper copy.



Sign Here, Please 

Documents will be processed as indicated provided reproduction quality permits. If permission to reproduce is granted, but 
neither box is checked, documents will be processed at Level 1. 

"I hereby grant to the Educational Resources Information Center (ERIC) nonexclusive permission to reproduce this document as 
indicated above. Reproduction from the ERIC microfiche or electronic/optical media by persons other than ERIC employees and its 
system contractors requires permission from the copyright holder. Exception is made for non-profit reproduction by libraries and other 
service agencies to satisfy information needs of educators in response to discrete inquiries."




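Position: [handwritten entry not legible in the source scan]
Printed Name: [not legible in the source scan]
Organization: [handwritten entry not legible in the source scan]
Address: [handwritten entry not legible in the source scan]
Telephone Number: [not legible in the source scan]
Date: [handwritten entry not legible in the source scan]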







THE CATHOLIC UNIVERSITY OF AMERICA 

Department of Education, O’ Boyle Hall 
Washington, DC 20064 
202 319-5120 

February 27, 1996 
Dear AERA Presenter, 

Congratulations on being a presenter at AERA¹. The ERIC Clearinghouse on Assessment and 
Evaluation invites you to contribute to the ERIC database by providing us with a written copy of 
your presentation. 

Abstracts of papers accepted by ERIC appear in Resources in Education (RIE) and are announced 
to over 5,000 organizations. The inclusion of your work makes it readily available to other 
researchers, provides a permanent archive, and enhances the quality of RIE. Abstracts of your 
contribution will be accessible through the printed and electronic versions of RIE. The paper will 
be available through the microfiche collections that are housed at libraries around the world and 
through the ERIC Document Reproduction Service. 

We are gathering all the papers from the AERA Conference. We will route your paper to the 
appropriate clearinghouse. You will be notified if your paper meets ERIC's criteria for inclusion 
in RIE: contribution to education, timeliness, relevance, methodology, effectiveness of 
presentation, and reproduction quality. 

Please sign the Reproduction Release Form on the back of this letter and include it with two copies 
of your paper. The Release Form gives ERIC permission to make and distribute copies of your 
paper. It does not preclude you from publishing your work. You can drop off the copies of your 
paper and Reproduction Release Form at the ERIC booth (23) or mail to our attention at the 
address below. Please feel free to copy the form for future or additional submissions. 

Mail to: AERA 1996/ERIC Acquisitions 

The Catholic University of America 
O'Boyle Hall, Room 210 
Washington, DC 20064 

This year ERIC/AE is making a Searchable Conference Program available on the AERA web 
page (http://tikkun.ed.asu.edu/aera/). Check it out! 




Director, ERIC/AE 



¹If you are an AERA chair or discussant, please save this form for future use. 



ERIC Clearinghouse on Assessment and Evaluation 



